Power Iteration Clustering

Authors

  • Frank Lin
  • William W. Cohen
Abstract

We present a simple and scalable graph clustering method called power iteration clustering (PIC). PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data. This embedding turns out to be an effective cluster indicator, consistently outperforming widely used spectral methods such as NCut on real datasets. PIC is very fast on large datasets, running over 1,000 times faster than an NCut implementation based on the state-of-the-art IRAM eigenvector computation technique.
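The idea in the abstract can be illustrated with a minimal sketch in Python with NumPy. It is not the authors' implementation. The abstract specifies only truncated power iteration on a normalized pairwise similarity matrix, so everything else here is an assumption made for illustration: the Gaussian similarity, the degree-based start vector, the stopping rule, the tiny one-dimensional k-means, and the names `pic_embedding` and `kmeans_1d`.

```python
import numpy as np

def pic_embedding(A, max_iter=50, tol=1e-5):
    """One-dimensional embedding from truncated power iteration.

    A is a symmetric, non-negative similarity matrix with zero diagonal.
    """
    n = A.shape[0]
    deg = A.sum(axis=1)
    W = A / deg[:, None]           # row-normalized similarity matrix
    v = deg / deg.sum()            # assumed start vector: normalized degrees
    prev_delta = np.inf
    for _ in range(max_iter):
        v_new = W @ v
        v_new /= np.abs(v_new).sum()       # keep the vector at unit L1 norm
        delta = np.abs(v_new - v).max()    # largest per-entry change
        v = v_new
        # Assumed truncation rule: stop once the change itself stops changing,
        # well before the vector flattens out to a constant.
        if abs(prev_delta - delta) < tol / n:
            break
        prev_delta = delta
    return v

def kmeans_1d(x, k, n_iter=100):
    """Plain k-means on scalars, with centers initialized at quantiles."""
    centers = np.quantile(x, np.linspace(0.0, 1.0, k))
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy data (hypothetical): two well-separated groups of 1-D points.
points = np.concatenate([np.linspace(0.0, 1.0, 12), np.linspace(10.0, 10.5, 6)])
diff = points[:, None] - points[None, :]
A = np.exp(-diff ** 2 / 2.0)       # Gaussian similarity, bandwidth 1 (assumed)
np.fill_diagonal(A, 0.0)

v = pic_embedding(A)
labels = kmeans_1d(v, k=2)
print(labels)
```

The groups are given different sizes on purpose: with a degree-based start vector, two perfectly symmetric groups would begin with the same average value and the embedding could not tell them apart. A random start vector is another common choice.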

Similar resources

Parallelized Power Iteration Clustering of Nouns and Verbs using Subject-Verb and Verb-Object Pairs

We explore the use of Power Iteration Clustering for large-scale clustering of nouns and verbs, using subject-verb-object triple relations. We have implemented a parallelized version of PIC, which can efficiently handle clustering of hundreds of thousands of sparse “documents.” We tested our implementation on clustering of over 1 million noun phrases, and over 650K verb phrases. We also evaluat...

Parallel Power Iteration Clustering for Big Data using MapReduce in Hadoop

Distributed data mining is a popular research topic today: as data volumes grow day by day, many problems arise in handling them, and the solutions that exist still fall short of expectations. Of the open issues in distributed data mining, this paper focuses mainly on reduci...

Client Based Power Iteration Clustering Algorithm to Reduce Dimensionality in Big Data

A cluster is a group of objects that are similar to one another but dissimilar to objects in other clusters. Clustering large datasets is a challenging task, and the need for greater scalability and performance motivates the use of parallelism. Although the use of Big Data has become essential, analyzing it is demanding. This paper presents the (pC-PIC) parallel Client based Power Itera...

GPIC - GPU Power Iteration Cluster

This work presents a new clustering algorithm, the GPIC, a Graphics Processing Unit (GPU) accelerated algorithm for Power Iteration Clustering (PIC). Our algorithm is based on the original PIC proposal, adapted to take advantage of the GPU architecture while maintaining the algorithm's original properties. The proposed method was compared against the serial and parallel Spark implementation, achieving a...

An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering

Here, an algorithm is presented for solving minimum sum-of-squares clustering problems using their difference-of-convex (DC) representations. The proposed algorithm is based on an incremental approach and applies the well-known DC algorithm at each iteration. It is tested and compared with other clustering algorithms on large real-world data sets.

Publication date: 2010